Nowadays, online communication is more convenient and popular than face-to-face
conversation, and people increasingly prefer it over face-to-face meetings. A huge
number of people use online chat systems to speak with their loved ones at any time,
anywhere in the world. This online engagement generates massive quantities of
conversation every second. People's feelings during a conversation can be gleaned as
useful information from these conversations. Sentiment analysis, a natural language
processing technique, can be used to analyze text and summarize the sentiment of such
material. The growing use of such communication in customer service portals on
e-commerce platforms, and in crime investigations based on digital evidence, is
increasing the need for sentiment analysis of conversations. Some languages, such as
English, have well-developed libraries and resources for natural language processing,
yet few studies have been conducted on Bangla. Extracting sentiment from Bangla
conversational data is more challenging due to the language's grammatical complexity,
which opens up vast research opportunities. In this work, support vector machine,
multinomial naive Bayes, k-nearest neighbors, logistic regression, decision tree, and
random forest were used. The information extracted from the dataset was labeled as
positive or negative.
1. INTRODUCTION

People have conversations in their daily lives, and in those conversations they express their
feelings and opinions. These feelings and opinions can be categorized as sadness, anger, happiness,
worry, disgust, fear, compliment, motivation, suggestion, and neutral [1]. Sentiment analysis, or
opinion mining, aims to use automated tools to detect subjective information such as opinions,
attitudes, and feelings expressed in text [2]. In our research work, we merged these categories into
two main categories: positive and negative [3]. Sentiment analysis can be done by capturing both
semantic and sentiment similarities among words [4]. Our model can identify whether a part of any
conversation is positive or negative. These two categories expose the sentiment of the person who
said it. Analyzing sentiment from people’s speech is a difficult job because a single sentence can
express several types of sentiment at the same time. Often only the people who hear it can
understand the sentiment properly. Our proposed model extracts sentiment from people’s
conversations with accuracy close to that of real-life judgment. In this research work, we propose a
model that classifies the sentiment of a conversation as positive or negative. To do so, we split our
dataset in an 80:20 ratio, using 80% of the data for training and 20% for testing. Holding out a test
set lets us measure accuracy on data the model has not seen. The accuracy of the model depends
heavily on the training dataset. We also used techniques such as tuning the parameters of the
machine learning models to obtain more accurate results. We achieved about 86% accuracy with the
support vector machine. The rest of the algorithms performed close to this highest accuracy.


2. LITERATURE REVIEW 

Extracting sentiment from Bangla conversational data means determining whether a conversation
is positive or negative. Bhowmik et al. [5] developed deep learning models for sentiment analysis of
Bangla text using an extended lexical dataset. They employed a rule-based Bangla text sentiment score
system to extract polarity from large texts. These polarities, along with the pre-processed text, were
then used as training samples for the neural network. The pre-processed texts were represented as
word vectors derived from pre-trained word embedding models with various word counts. A Word2Vec
matrix containing the highest-probability words was used as the weight matrix of the embedding layer
to fit the deep learning models. The paper also includes a thorough examination of selected deep
learning models, as well as some fine-tuning. Their proposed hierarchical approach achieved accuracies
of 78.52 percent, 80.82 percent, and 84.18 percent. According to Aurpa et al. [6], certain content,
such as threats and sexual harassment, is more accessible on social media than in traditional media.
Harassment, vulgarity, personal attacks, and bullying can all occur because of extremely toxic
internet content. Because Bangla is the world's seventh most spoken language, its use on Facebook has
risen in recent years. The use of offensive comments in Bangla on Facebook has also grown
significantly, but there is little research on the subject. Their study focuses on recognizing abusive
Bangla comments on social media (Facebook) so that they can be filtered out in the early phases of
social media use. To classify hostile comments quickly and accurately, transformer-based deep neural
network models were used. They employed the pre-trained language architectures bidirectional encoder
representations from transformers (BERT) and efficiently learning an encoder that classifies token
replacements accurately (ELECTRA). Average accuracy, precision, recall, and F1-score were used to
assess the proposed models. The results showed that their BERT and ELECTRA architectures performed
well, with test accuracies of 85.00 percent and 84.92 percent, respectively. Rahib et al. [7]
investigated how Bangladeshis are reacting to and dealing with the coronavirus disease (COVID-19)
situation. In this investigation, statuses and comments on COVID-19 concerns were gathered from
multiple Facebook pages and YouTube channels run by reputable Bangladeshi news organizations and
health specialists. A variety of machine learning algorithms were studied, ranging from conventional
algorithms such as support vector machine and random forest to deep learning algorithms such as
convolutional neural networks and long short-term memory. Experiments were carried out on the authors'
own labeled dataset of 10,581 data points. The results demonstrate that long short-term memory
outperforms all other models, with an accuracy of 84.92 percent. To detect the polarity of textual
Facebook posts in Bangla containing people's points of view on Bangladesh cricket, Faruque et al. [8]
proposed a sentiment polarity detection approach that uses three popular supervised machine learning
algorithms: naive Bayes (NB), support vector machines (SVM), and logistic regression (LR). With an
accuracy of 83 percent when using n-grams as features, LR outperformed SVM and NB. Iqbal et al. [9]
proposed a four-step process for categorizing six emotions in Bengali texts, consisting of data
crawling, pre-processing, labelling, and verification, with 7,000 texts labeled into six basic emotion
groups. The dataset achieved a Cohen's kappa score of 0.969, which reflects close agreement between
the corpus annotators and experts. According to the analysis, the distribution of emotion words also
follows Zipf's law. The BEmoC study's findings were also presented in terms of coding consistency,
emotion density, and the most frequently used emotion words.

Shetu et al. [10] established a framework for parsing text data in paragraphs. To extract sentiment
from a text, they employed the bag-of-words method and lexical analysis. Mamun et al. [11]
demonstrated that an ensemble approach (i.e., logistic regression + random forest + support vector
machine) with term frequency-inverse document frequency (unigram + bigram + trigram) features
outperformed the other classifier models on the developed dataset, achieving the highest accuracy of
82 percent. Most of the emotions conveyed on social media platforms are expressed through writing
(such as statuses, tweets, comments, and reviews). Their work presents an ensemble-based method for
categorizing Bengali textual sentiment into positive and negative categories. Because no Bengali
sentiment corpus was available, the work additionally created a dataset called "Bengali sentiment
analysis dataset". Neethu and Rajasree [12] attempted to assess the sentiment of Twitter posts in a
particular domain. They suggested a new feature vector that can differentiate between positive and
negative sentiment in tweets. To examine Twitter data for sentiment analysis, Jain and Dandannavar
[13] used naive Bayes and decision tree machine learning methods. Their proposed model employs Apache
Spark because it is scalable and fast. Rahman and Dey [14] provide two freely accessible Bangla
datasets for aspect-based sentiment analysis. One dataset contains human-annotated user comments about
cricket, while the other contains restaurant customer reviews. They also presented a baseline method
for analyzing their datasets on the aspect category extraction subtask.


3. RESEARCH METHOD 

This section illustrates the overall architecture of our proposed system. The research method
consists of data collection, data pre-processing, model selection, and statistical analysis; these
steps and their implementation are discussed in this section. Figure 1 shows the full method at a
glance.


[Flowchart: Data Collection -> Data Preprocessing -> Model Selection -> Model Training -> Testing Data -> Output]

Figure 1. Method at a glance


3.1. Data collection procedure 

We collected conversation data for our research work from various Bangla movie and short film
scripts. These conversations cover a wide range of topics such as food, family, motivation, fraud,
business, and friends. After analyzing the collected data, we split it into two categories: positive
and negative. We collected 1,141 conversations. These conversations include emotions such as
happiness, sadness, anger, worry, and fear. These emotions help us to divide the whole dataset into
the two main categories, positive and negative. Among the 1,141 conversations, 570 are positive and
571 are negative. Figures 2 and 3 show a sample of the dataset and the class label distribution.


[Table preview with columns Conversation (Bangla text) and Sentiment (positive or negative); 1141 rows x 2 columns]

Figure 2. Sample data

[Pie chart: Positive Sentiment 50%, Negative Sentiment 50%]

Figure 3. Class label distribution


3.2. Data preprocessing and organizing 

First, we collected data from scripts and stored them in an xlsx file. The dataset has two class
labels: positive and negative. As already discussed, we collected the data from movie and short film
scripts in the form of conversations. Every conversation starts with a single word or a single
sentence, and people can express their feelings, emotions, and thoughts through a single word or
sentence. To classify these expressions into the two main classes, we merged happiness, joy,
motivation, and thankfulness into positive conversations, and sadness, anger, backbiting, and worry
into negative conversations. During pre-processing, we remove punctuation in the first step. In
natural language processing, for every language, it is essential to identify and remove stop words.
For our research work, we collected Bangla stop words and removed them to clean our data. Our list
contains about 410 Bangla stop words. Figure 4 shows the Python code for removing Bangla stop words
and punctuation, and Figure 5 shows the cleaned data after pre-processing.


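import re

def process_conversations(Conversation):
    # Load the Bangla stop-word list (whitespace-separated words)
    with open('bangla_stopwords.txt', 'r', encoding='utf8') as f:
        stp = f.read().split()
    # Drop every token that is a stop word
    result = Conversation.split()
    Conversation = [word.strip() for word in result if word not in stp]
    Conversation = " ".join(Conversation)
    # Replace characters outside the Bengali Unicode block (U+0980 to U+09FF) with spaces
    Conversation = re.sub('[^\u0980-\u09FF]', ' ', str(Conversation))
    return Conversation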


Figure 4. Removing stop words and punctuations 
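
The cleaning step can also be tried without a stop-word file. The following is a minimal sketch, not the authors' code: the function name `clean` and the three-word stop list are hypothetical, and it strips non-Bengali characters before filtering stop words (the order described in the text), so that a stop word followed by a comma is still matched.

```python
import re

# Hypothetical three-word stop list for illustration; the list used in the
# paper has about 410 entries and is read from a file.
STOP_WORDS = {"এবং", "কিন্তু", "আমি"}

# Any run of characters outside the Bengali Unicode block (U+0980 to U+09FF).
NON_BENGALI = re.compile(r"[^\u0980-\u09FF]+")


def clean(conversation, stop_words=STOP_WORDS):
    # Replace punctuation, digits, and Latin letters with spaces first,
    # so that a token such as "আছি," becomes "আছি".
    text = NON_BENGALI.sub(" ", conversation)
    # Then drop every token that appears in the stop-word list.
    tokens = [token for token in text.split() if token not in stop_words]
    return " ".join(tokens)


print(clean("আমি ভালো আছি, কিন্তু তুমি?"))
```

An input that contains no Bengali characters at all comes back as an empty string, which is the case the prediction code guards against with its length check.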


[Three examples, each listing the original conversation, its cleaned form, and its sentiment label: positive, positive, and negative]

Figure 5. Cleaned data
To extract features from each conversation, the number of words and the number of characters are
computed. Figure 6 shows the word frequency and the character frequency. After the pre-processing
procedure, label encoding is applied to the sentiment column, and then a pickle file is generated.
The pickle file stores intermediate data for reuse, which saves time at runtime. In this work, our
cleaned data is stored as a pickle file for the subsequent procedures. Figure 7 shows the cleaned
data along with the word count and character count of each conversation.
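
The per-conversation features and the label encoding described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the function name `add_features` is hypothetical, the character count is assumed to include spaces, and the mapping negative = 0, positive = 1 is assumed because the prediction code in Section 4.1 treats 0 as negative.

```python
import pickle

# Assumed encoding: the prediction code treats label 0 as negative.
LABEL_IDS = {"negative": 0, "positive": 1}


def add_features(rows):
    """Turn (cleaned_text, sentiment) pairs into records with count features."""
    records = []
    for cleaned, sentiment in rows:
        records.append({
            "cleaned": cleaned,
            "sentiment": LABEL_IDS[sentiment],
            "length": len(cleaned.split()),  # number of words
            "no_char": len(cleaned),         # number of characters (spaces included)
        })
    return records


rows = [("ভালো আছি তুমি", "positive"), ("ভালো না", "negative")]
records = add_features(rows)

# Round-trip through pickle in memory; pickle.dump(records, file) would
# write the same bytes to a file for reuse in later steps.
restored = pickle.loads(pickle.dumps(records))
print(restored[0]["length"], restored[1]["sentiment"])
```

Storing the records once and reloading them avoids repeating the cleaning step every time a model is trained.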


[Two scatter plots, "Word Frequency" and "Character Frequency", each plotted against "No. of Conversation" (0 to 1200)]

Figure 6. Word frequency and character frequency
[Table preview with columns Conversation, Sentiment, cleaned, length, no_char; the Bangla text columns are omitted here]

Index   Sentiment   length   no_char
0       negative    4        19
1       negative    2        15
2       negative    4        24
3       positive    2        14
4       positive    2        13
1136    negative    4        16
1137    positive    3        23
1138    negative    4        18
1139    negative    2        9
1140    negative    3        15

1141 rows x 5 columns

Figure 7. Sample of cleaned dataset
3.3. Machine learning algorithms and statistical analysis

Our dataset contains about 571 records for positive and 570 records for negative conversations.
To split the dataset, we used a train-test split function. We followed supervised machine learning
techniques. We used 80% of our data to train the model and 20% to test it: 912 records were used for
training and 229 for testing. To measure the accuracy on our dataset, we applied several
classification algorithms: support vector machine, multinomial naive Bayes, k-nearest neighbors,
logistic regression, decision tree, random forest, and stochastic gradient descent. Figure 8 gives a
brief overview of how we carried out our research.
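
The 912/229 counts follow from the 80:20 ratio once a rounding rule is fixed, since 20% of 1,141 is 228.2. The sketch below is a standard-library illustration, not the train-test split function the authors used: the helper name `train_test_split_indices` and the seed are hypothetical, and rounding the test share up is an assumption chosen because it reproduces the reported counts.

```python
import math
import random


def train_test_split_indices(n_samples, test_size=0.2, seed=42):
    """Shuffle the indices 0..n_samples-1 and cut them into train and test parts."""
    # Assumption: the test share is rounded up (228.2 -> 229).
    n_test = math.ceil(n_samples * test_size)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(1141)
print(len(train_idx), len(test_idx))  # 912 229
```

Fixing the seed makes the split reproducible, so that every classifier is evaluated on the same 229 held-out conversations.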


[Flowchart with the blocks: Data Collection; Data Preprocessing & Feature Extraction; Selection of Training & Test Data; Applying Machine Learning Algorithms (Multinomial Naive Bayes, Support Vector Machine, Stochastic Gradient Descent, K-Nearest Neighbor, Decision Tree, Random Forest, Logistic Regression); Model Training]

Figure 8. Proposed model structure


3.3.1. Feature extraction 

We employ machine learning methods to achieve our natural language processing goals. Our model
is trained on features extracted from each sentence of the two classes. A tokenizer is used for this
purpose: it splits sentences into individual words. In addition, term frequency-inverse document
frequency (TF-IDF) is a numerical statistic that reflects the importance of a term in a text. This
approach has been used in several notable publications for various languages. Their success inspired
us, and we found that our learning algorithms were most accurate with these features.
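
To make the TF-IDF weighting concrete, the sketch below computes it by hand for three tokenized toy conversations. It is a minimal illustration, not the `calc_gram_tfidf` helper used in Section 4.1 (which is not shown in the paper): the function name `tfidf` is hypothetical, and it assumes the plain textbook form, term frequency times ln(N / document frequency), whereas library implementations usually add smoothing and normalization.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            # Relative term frequency times the log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weighted


docs = [["ভালো", "আছি"], ["ভালো", "না"], ["খুব", "খারাপ"]]
weights = tfidf(docs)
for term, weight in weights[0].items():
    print(term, round(weight, 3))
```

In the first conversation, the word that occurs in only one document receives a higher weight than the word shared with the second conversation, which is the behavior that makes TF-IDF useful for separating classes.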


3.3.2. Classifier algorithms 

Random forest builds numerous decision trees during training. The naïve Bayes classifier assumes
that the presence of a particular feature in a class is unrelated to the presence of any other
feature. This model is straightforward to build and is particularly useful for very large datasets.
Naïve Bayes can even outperform more sophisticated classification algorithms. The logistic regression
model estimates the probability of a class or event. It can also be extended to several classes, for
example to decide which animal appears in each image of a set of photographs of different animals.
Stochastic gradient descent is a well-known optimization method, used particularly in machine
learning algorithms, for finding the model parameters that give the best fit between predicted and
actual outputs.
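
The independence assumption of naïve Bayes can be seen directly in code: the score of a class is its log prior plus one independent log-likelihood term per word. The class below is a small teaching sketch with add-one smoothing, written for this illustration; the name `TinyMultinomialNB` and the toy data are hypothetical, and it is not the implementation used for the experiments.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Multinomial naive Bayes over token lists with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {token for doc in docs for token in doc}
        # Per-class word counts.
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _score(self, doc, c):
        score = self.log_prior[c]
        for token in doc:
            if token in self.vocab:  # words never seen in training are ignored
                # Each word contributes independently of all the others.
                numerator = self.word_counts[c][token] + 1
                denominator = self.total[c] + len(self.vocab)
                score += math.log(numerator / denominator)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self._score(doc, c))


# Toy data: label 1 = positive, label 0 = negative.
docs = [["ভালো", "লাগছে"], ["খুব", "ভালো"], ["খুব", "খারাপ"], ["খারাপ", "লাগছে"]]
labels = [1, 1, 0, 0]
model = TinyMultinomialNB().fit(docs, labels)
print(model.predict(["ভালো"]), model.predict(["খারাপ"]))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing keeps a word that never occurred in a class from forcing that class's probability to zero.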


4. EXPERIMENTAL RESULT AND ANALYSIS 

In this modern era, an understanding of IoT [15]-[17], cyber-security [18], and, in particular,
machine learning and deep learning [19]-[25] is crucial for the intelligent analysis of data and the
development of related smart applications. We refined our model and dataset according to our
requirements using a machine learning approach. From this refinement, we conclude that the chosen
classifier is usable for a wide range of applications on our dataset. As expected, we achieved about
86% accuracy with our proposed model, which is an encouraging outcome. This performance also opens
a path toward further improvement of the results.

The goal of the research was to identify whether a conversation is positive or negative. We
applied classifiers based on different machine learning models to determine the conversation type.
The result has two classes: positive and negative. The dataset of 1,141 conversations was used to
train and test each of the models. We obtained different accuracies for the different models. Among
the seven models, the support vector machine and multinomial naive Bayes perform best, with the
highest accuracy. As discussed earlier, we collected data from scripts in the form of conversations.
The conversations contain people's emotions such as happiness, sadness, worry, annoyance, and
motivation. We merged and categorized them into two main types, positive and negative. The
decision-making capability of the classifiers was measured by their performance. Accuracy, precision,
recall, and F1-score were used to determine the performance of the classifiers. The overall accuracy
of a classifier was considered an adequate standard; it gives a notion of the proportion of correctly
classified samples in the test set.

Table 1 gives the accuracy scores obtained for the classifiers. The support vector machine gives
the highest accuracy score of 0.85589, and multinomial naive Bayes gives a very similar accuracy of
0.8515. Because these scores are so close, it was necessary to calculate the other performance
measures to decide on a suitable classifier for our dataset.

Precision measures the agreement of the true data labels with the positive labels given by the
classifier. Because precision is defined per class label, we have to calculate the precision scores
for each of the two class labels. Table 2 gives the precision values for each of the classifiers used
in this research work. The random forest classifier gives a score of 0.93 and multinomial naive Bayes
gives 0.85 for positive conversation.

Recall, also known as sensitivity, represents the effectiveness of the classifier in identifying
class labels. We concentrated on achieving a score near 1 for the positive class label. The recall
scores of the classifiers are reported in Table 3. The decision tree and support vector machine had a
recall score of 0.92 for positive conversation. The F1-score is used to determine the relationship
between the true positive labels and those given by the classifier. It is calculated as the harmonic
mean of precision and recall, for each of the two labels and for every classifier. A score close to 1
for the positive class label was the criterion for determining the optimum classifier. Table 4 shows
the F1-scores for the class labels. Support vector machine, multinomial naive Bayes, and stochastic
gradient descent are the most effective classifiers for our dataset.


Table 1. Accuracy of classifiers
Classifier                     Accuracy
Random forest                  74.24%
Decision tree                  76.42%
Logistic regression            82.53%
K-nearest neighbors            82.97%
Stochastic gradient descent    83.41%
Multinomial naive Bayes        85.15%
Support vector machine         85.59%

Table 2. Precision of classifiers
Classifier                     Precision
Random forest                  67.01%
Decision tree                  69.62%
Logistic regression            79.23%
K-nearest neighbors            79.39%
Stochastic gradient descent    79.55%
Multinomial naive Bayes        85.96%
Support vector machine         81.68%

Table 3. Recall of classifiers
Classifier                     Recall
Random forest                  96.55%
Decision tree                  94.83%
Logistic regression            88.79%
K-nearest neighbors            89.66%
Stochastic gradient descent    90.52%
Multinomial naive Bayes        84.48%
Support vector machine         92.24%

Table 4. F1-score of classifiers
Classifier                     F1-score
Random forest                  79.15%
Decision tree                  80.29%
Logistic regression            83.74%
K-nearest neighbors            84.21%
Stochastic gradient descent    84.68%
Multinomial naive Bayes        85.22%
Support vector machine         86.64%
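
The four measures in Tables 1 to 4 are all derived from the same confusion counts, which the following sketch makes explicit. The counts used here are hypothetical and are not reported in the paper: they are one set of integers that sums to the 229 test samples and reproduces the support vector machine row, with "positive" taken as the positive class. The function name `binary_metrics` is likewise an assumption of this sketch.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1


# Hypothetical counts (not from the paper), consistent with a 229-sample
# test set and with the support vector machine row of Tables 1 to 4.
acc, prec, rec, f1 = binary_metrics(tp=107, fp=24, fn=9, tn=89)
print(f"{acc:.2%} {prec:.2%} {rec:.2%} {f1:.2%}")  # 85.59% 81.68% 92.24% 86.64%
```

Because the F1-score is a harmonic mean, it sits closer to the lower of precision and recall, which is why a classifier with very high recall but low precision still ends up with a modest F1-score.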


Our objective is to predict the sentiment of a conversation with high precision; in Table 5,
multinomial naive Bayes and support vector machine achieve the highest precision, while random forest
achieves the highest recall. In terms of accuracy, support vector machine, multinomial naive Bayes,
and stochastic gradient descent perform best among the classifiers, as shown in Table 5. Support
vector machines work well for this task because our dataset is comparatively compact and the labels
are poorly understood. K-nearest neighbors works effectively since there are few dimensions or
attributes. The assumption of class-conditional independence holds only for a large dataset, which is
why the decision tree performs poorly in this case.

Decision trees are also prone to overfitting. Because a decision tree is not robust to noise and
does not generalize well to future observed data, it does not perform well here. Figure 9 shows the
overall performance comparison.
Table 5. Performance analysis of different algorithms
Classifier                     Accuracy   Precision   Recall    F1-score
Random forest                  74.24%     67.01%      96.55%    79.15%
Decision tree                  76.42%     69.62%      94.83%    80.29%
Logistic regression            82.53%     79.23%      88.79%    83.74%
K-nearest neighbors            82.97%     79.39%      89.66%    84.21%
Stochastic gradient descent    83.41%     79.55%      90.52%    84.68%
Multinomial naive Bayes        85.15%     85.96%      84.48%    85.22%
Support vector machine         85.59%     81.68%      92.24%    86.64%

Figure 9. Performance analysis


4.1. Prediction 

We tested our model on random conversation data. Figures 10 and 11 show the prediction of a
positive and a negative conversation, respectively. This means that our proposed model can extract
sentiment from Bangla conversation data.


import pickle

# Load the trained support vector machine model
with open('cs_svm.pkl', 'rb') as model:
    svm_model = pickle.load(model)
Conversation = '...'  # Bangla conversation to classify
processed_conversation = process_conversations(Conversation)
if len(processed_conversation) > 0:
    # calc_gram_tfidf (defined elsewhere) returns the vectorizer and the feature matrix
    cv, feature_vector = calc_gram_tfidf(dataset.cleaned)
    feature = cv.transform([processed_conversation]).toarray()
    sentiment = svm_model.predict(feature)
    if sentiment == 0:
        print("It is a Negative conversation")
    else:
        print("It is a Positive conversation")
else:
    print("This conversation doesn't contain any Bengali words, thus cannot predict the sentiment.")


It is a Positive conversation 


Figure 10. Predicting positive conversation 


import pickle

# Load the trained support vector machine model
with open('cs_svm.pkl', 'rb') as model:
    svm_model = pickle.load(model)
Conversation = '...'  # Bangla conversation to classify
processed_conversation = process_conversations(Conversation)
if len(processed_conversation) > 0:
    # calc_gram_tfidf (defined elsewhere) returns the vectorizer and the feature matrix
    cv, feature_vector = calc_gram_tfidf(dataset.cleaned)
    feature = cv.transform([processed_conversation]).toarray()
    sentiment = svm_model.predict(feature)
    if sentiment == 0:
        print("It is a Negative conversation")
    else:
        print("It is a Positive conversation")
else:
    print("This conversation doesn't contain any Bengali words, thus cannot predict the sentiment.")


It is a Negative conversation 


Figure 11. Predicting negative conversation 
5. CONCLUSION

This research work concludes with the expected outcome of using a machine learning approach to
extract sentiment from Bangla conversation data. Text mining and text analysis are still very new
for the Bangla language. Although working under limitations and with a lack of resources is a
difficult task, we tried to overcome these difficulties. Advances in technology make communication
easier, but embracing these advances requires keeping control of the enormous amount of data they
generate. We should pay attention to these techniques to make the world of data more accessible and
convenient.


6. FUTURE WORK 

This research work proposes a methodology for working with Bangla conversation data. To that
end, machine learning models were trained on Bangla conversation data and were able to extract
sentiment from those conversations. There is scope to apply a deep learning approach to our dataset
to improve performance. In this work, we extract sentiment as a positive or negative category, but
on a larger scale, individual emotions and sentiments such as sadness, anger, neutrality, happiness,
and fear can also be extracted. Real-time conversations can also be converted into text and analyzed
for sentiment. Scope lies in every such opportunity, and opportunity leads to innovation and
evolution.